Using of n-grams from morphological tags for fake news classification

Abstract

Research into techniques for effective fake news detection has become much needed and attractive. These techniques have a background in many research disciplines, including morphological analysis. Several researchers have stated that simple content-related n-grams and POS tagging had been proven insufficient for fake news classification. However, they did not report any empirical research results that could confirm these statements experimentally in the last decade. Considering this contradiction, the main aim of the paper is to evaluate the potential of the common use of n-grams and POS tags for the correct classification of fake and true news. The dataset of published fake or real news about the current Covid-19 pandemic was pre-processed using morphological analysis. As a result, n-grams of POS tags were prepared and further analysed. Three techniques based on POS tags were proposed and applied to different groups of n-grams in the pre-processing phase of fake news detection. The n-gram size was examined first. Subsequently, the most suitable depth of the decision trees for sufficient generalization was scoped. Finally, the performance measures of the models based on the proposed techniques were compared with the standardised reference TF-IDF technique. Performance measures of the model such as accuracy, precision, recall and f1-score are considered, together with the 10-fold cross-validation technique. Simultaneously, the question of whether the TF-IDF technique can be improved using POS tags was researched in detail. The results showed that the newly proposed techniques are comparable with the traditional TF-IDF technique. At the same time, it can be stated that morphological analysis can improve the baseline TF-IDF technique. As a result, the performance measures of the model, precision for fake news and recall for real news, were statistically significantly improved.
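To make the pre-processing idea concrete, the following is a minimal sketch of building n-grams from POS-tag sequences and weighting them with TF-IDF. It is not the authors' implementation: the abstract does not specify the tagger, the tagset, or the TF-IDF variant, so the Penn Treebank-style tags, the hand-tagged example documents, the function names `tag_ngrams` and `tfidf`, and the plain `tf * log(N / df)` weighting are all assumptions made for illustration.

```python
import math
from collections import Counter


def tag_ngrams(tags, n):
    """Return the contiguous n-grams (as tuples) of a POS-tag sequence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def tfidf(documents, n):
    """Compute TF-IDF weights of POS-tag n-grams for each document.

    documents: list of POS-tag sequences, one per news item.
    Returns one dict per document mapping n-gram -> weight.
    Assumption: tf is the relative frequency within the document and
    idf is log(N / df); other TF-IDF variants exist.
    """
    counts = [Counter(tag_ngrams(doc, n)) for doc in documents]

    # Document frequency: the number of documents containing each n-gram.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    total_docs = len(documents)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append({
            gram: (freq / total) * math.log(total_docs / df[gram])
            for gram, freq in c.items()
        })
    return weights


# Hypothetical, hand-tagged news items (Penn Treebank-style tags assumed).
docs = [
    ["DT", "NN", "VBZ", "JJ"],
    ["DT", "NN", "VBD", "DT", "NN"],
]
bigram_weights = tfidf(docs, 2)
```

With this weighting, an n-gram that occurs in every document (here the bigram `("DT", "NN")`) receives weight zero, while n-grams specific to one document receive positive weight. The resulting dictionaries could then be turned into feature vectors for a classifier such as a decision tree and evaluated with 10-fold cross-validation, as the abstract describes.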

Similar resources

Protein classification using modified n-grams and skip-grams.

Motivation Classification by supervised machine learning greatly facilitates the annotation of protein characteristics from their primary sequence. However, the feature generation step in this process requires detailed knowledge of attributes used to classify the proteins. Lack of this knowledge risks the selection of irrelevant features, resulting in a faulty model. In this study, we introduce...

Tweets Classification using Corpus Dependent Tags, Character and POS N-grams

This paper is part of the Author Profiling task at the PAN 2015 contest, in which participants had to predict the gender, age and personality traits of Twitter users in four different languages (Spanish, English, Italian and Dutch). Our approach takes into account stylistic features represented by character N-grams and POS N-grams to classify tweets. The main idea of using character N-grams is to ext...

Enhancing News Articles Clustering using Word N-Grams

In this work we explore the possible enhancement of the document clustering results, and in particular clustering of news articles from the web, when using word-based n-grams during the keyword extraction phase. We present and evaluate a weighting approach that combines clustering of news articles derived from the web using n-grams, extracted from the articles at an offline stage. We compared t...

Biological Named Entity Recognition Using n-grams and Classification Methods

We propose a biological named entity recognition system which uses classification methods and an n-gram model to annotate terms in text. A novel method is presented to express lexical features in a pattern notation. Prefix and suffix characters are used instead of lists of potential terms or other external resources. Classification exemplars are created from text by using a word n-gram...

Feature Selection on Chinese Text Classification Using Character N-Grams

In this paper, we perform Chinese text classification using n-gram text representation on TanCorp, a new large corpus built specially for Chinese text classification, with more than 14,000 texts divided into 12 classes. We use different n-gram features (1-, 2-grams or 1-, 2-, 3-grams) to represent documents. Different feature weights (absolute text frequency, relative text frequency, absolute n-gram f...

Journal

Journal title: PeerJ

Year: 2021

ISSN: 2167-8359

DOI: https://doi.org/10.7717/peerj-cs.624